First Steps with Stan

Maud goes to Vegas

Andrew B. Collier

andrew@exegetic.biz | DataWookie | DataWookie

eRum (Budapest)

15 May 2018 at 16:00

Meet Maud

description   symbols    payout
(no win)                      0
1x mouse      🐭              1
1x cat        🐱              2
2x mouse      🐭🐭             5
2x cat        🐱🐱            10
3x mouse      🐭🐭🐭           20
3x cat        🐱🐱🐱          100

Maud’s Burning Questions

  • What is the hit rate?
  • What is the RTP (return to player)?
  • What is the hit rate for each winning combination?

Maud’s Data (by Session)

sessions
# A tibble: 100 x 8
   session               details spins  hits wager payout  hit_rate       rtp
     <int>                <list> <int> <int> <int>  <dbl>     <dbl>     <dbl>
 1       1  <data.frame [7 x 3]>     7     2    10      3 0.2857143 0.3000000
 2       2 <data.frame [19 x 3]>    19     7    30     29 0.3684211 0.9666667
 3       3 <data.frame [19 x 3]>    19     3    22      3 0.1578947 0.1363636
 4       4 <data.frame [26 x 3]>    26     7    30     13 0.2692308 0.4333333
 5       5 <data.frame [23 x 3]>    23     8    31     35 0.3478261 1.1290323
 6       6 <data.frame [20 x 3]>    20     8    26     12 0.4000000 0.4615385
 7       7 <data.frame [22 x 3]>    22     5    30     20 0.2272727 0.6666667
 8       8 <data.frame [22 x 3]>    22     4    25     10 0.1818182 0.4000000
 9       9 <data.frame [18 x 3]>    18     4    26      6 0.2222222 0.2307692
10      10 <data.frame [26 x 3]>    26     8    33     75 0.3076923 2.2727273
# ... with 90 more rows

Maud’s Data (by Spin)

spin <- sessions %>%
  select(session, details) %>%
  unnest() %>%
  mutate(success = as.integer(payout > 0))
# A tibble: 1,972 x 5
   session  spin wager payout success
     <int> <int> <int>  <dbl>   <int>
 1       1     1     1      1       1
 2       1     2     1      0       0
 3       1     3     1      0       0
 4       1     4     3      0       0
 5       1     5     2      0       0
 6       1     6     1      2       1
 7       1     7     1      0       0
 8       2     1     2      0       0
 9       2     2     2      0       0
10       2     3     1      1       1
# ... with 1,962 more rows
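Before fitting any model, Maud's point estimates can be read straight off the spin-level data. A minimal base-R sketch with toy values standing in for the real records (the actual numbers come from the `spin` tibble above):

```r
# Toy spin-level data (illustrative; the real values live in `spin`).
payout <- c(1, 0, 0, 0, 0, 2, 0, 0, 0, 1)
wager  <- c(1, 1, 1, 3, 2, 1, 1, 2, 2, 1)
success <- as.integer(payout > 0)

hit_rate <- mean(success)             # Proportion of winning spins
rtp      <- sum(payout) / sum(wager)  # Total payout relative to total wager

hit_rate
rtp
```

These are the empirical analogues of the quantities the Stan models below estimate, but without any measure of uncertainty.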

Q1: Hit Rate

Probability distribution for binomial process:

\[ P(k | n, \theta) = \binom{n}{k} \theta^k (1 - \theta)^{n - k} \]

where

  • \(k\) successes in \(n\) trials; and
  • the probability of success on any trial is \(\theta\).

Multiple experiments: for session \(i\) there are \(k_i\) hits from \(n_i\) spins.
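As a quick sanity check, the pmf above can be evaluated term by term and compared with R's built-in `dbinom()`; the values n = 22 and k = 5 echo session 7 in the table, and theta = 0.3 is an illustrative choice:

```r
# The binomial pmf written out term by term, checked against dbinom().
n <- 22; k <- 5; theta <- 0.3
manual  <- choose(n, k) * theta^k * (1 - theta)^(n - k)
builtin <- dbinom(k, size = n, prob = theta)
all.equal(manual, builtin)  # TRUE
```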

Probability distribution for Bernoulli process:

\[ P(k | \theta) = \theta^k (1 - \theta)^{1 - k} \]

where

  • \(k = 0\) indicates failure;
  • \(k = 1\) indicates success; and
  • the probability of success on any trial is \(\theta\).

Bayes’ Theorem

\[ p(\theta|y, X) = \frac{p(y|X, \theta) p(\theta)}{p(y)} \propto p(y|X, \theta) p(\theta) \]

where

  • \(y\) are observations;
  • \(X\) are predictors;
  • prior — \(p(\theta)\);
  • likelihood — \(p(y|X, \theta)\); and
  • posterior — \(p(\theta|y, X)\).
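The proportionality above is all a sampler needs: posterior ∝ likelihood × prior. A grid-approximation sketch in base R makes this concrete for the Bernoulli model (the data here are simulated with a true theta of 0.3, purely for illustration):

```r
# Grid approximation of Bayes' theorem for the Bernoulli model:
# evaluate likelihood x prior on a grid of theta values, then normalise.
set.seed(13)
y <- rbinom(100, size = 1, prob = 0.3)   # Simulated spins (true theta = 0.3)

theta <- seq(0.001, 0.999, length.out = 999)
prior <- dbeta(theta, 2, 2)
likelihood <- theta^sum(y) * (1 - theta)^(length(y) - sum(y))
posterior <- likelihood * prior
posterior <- posterior / sum(posterior)  # Normalise over the grid

theta[which.max(posterior)]              # Posterior mode, near 0.3
```

Stan does something far more capable (sampling in high dimensions), but the logic is the same.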

Stan: a platform for statistical modelling and Bayesian inference; models are written in the Stan language.

RStan: the R interface to Stan, provided by the rstan package.

Stan Workflow

  1. Choose a model.
  2. Write Stan program (likelihood and priors).
  3. Stan parser converts this to C++.
  4. Compile C++.
  5. Execute compiled binary.
  6. Evaluate results. Optionally return to 2.
  7. Inference based on posterior sample.

To use rstan you need a .stan file (the model) and a .R script (to prepare the data, run the sampler and process the results).

Stan Skeleton

data {
  int<lower = 0> N;
  int<lower = 0, upper = 1> y[N];
}
parameters {
  real<lower = 0, upper = 1> theta;
}
model {
  y ~ bernoulli(theta);                // Likelihood (implicit uniform prior on theta)
}

library(rstan)

# Use all available cores.
#
options(mc.cores = parallel::detectCores())

trials <- list(
  N       = nrow(spin),
  y       = spin$success
)

fit <- stan(
  file    = "bernoulli.stan",
  data    = trials,
  chains  = 2,                         # Number of Markov chains
  warmup  = 500,                       # Warmup iterations per chain
  iter    = 1000                       # Total iterations per chain
)

class(fit)
[1] "stanfit"
attr(,"package")
[1] "rstan"
samples <- as.data.frame(fit)

head(samples)
      theta      lp__
1 0.3095256 -1226.968
2 0.3072672 -1227.066
3 0.3124851 -1226.912
4 0.3200480 -1227.132
5 0.3140362 -1226.914
6 0.2929604 -1228.812
nrow(samples)
[1] 1000

print(fit, probs = c(0.025, 0.5, 0.975))
Inference for Stan model: bernoulli.
2 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=1000.

          mean se_mean   sd     2.5%      50%    97.5% n_eff Rhat
theta     0.31    0.00 0.01     0.29     0.31     0.33   413    1
lp__  -1227.43    0.04 0.70 -1229.56 -1227.14 -1226.91   348    1

Samples were drawn using NUTS(diag_e) at Tue May 15 17:45:07 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Use summary() to get information for each chain.

data {
  int<lower=0> N;
  int<lower=0> hits[N];
  int<lower=0> spins[N];
}
parameters {
  real<lower=0,upper=1> theta;
}
model {
  hits ~ binomial(spins, theta);       // Likelihood
  theta ~ beta(2, 2);                  // Prior
}
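The beta prior is conjugate to the binomial likelihood, so this particular posterior is available in closed form, Beta(a + k, b + n − k), which makes a handy sanity check on the MCMC output. A sketch (n = 1972 spins comes from the data; k = 611 is a hypothetical hit count chosen to be consistent with the reported theta of about 0.31):

```r
# Closed-form posterior for a Beta(a, b) prior and binomial likelihood.
conjugate_posterior <- function(k, n, a = 2, b = 2) {
  c(mean  = (a + k) / (a + b + n),
    lower = qbeta(0.025, a + k, b + n - k),
    upper = qbeta(0.975, a + k, b + n - k))
}

# n = 1972 spins from the data; k = 611 is illustrative only.
conjugate_posterior(k = 611, n = 1972)
```

For models without a conjugate structure (like the ones that follow) no such shortcut exists, which is exactly where Stan earns its keep.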

print(fit, probs = c(0.025, 0.5, 0.975))
Inference for Stan model: binomial-beta-prior.
2 chains, each with iter=1000; warmup=500; thin=1; 
post-warmup draws per chain=500, total post-warmup draws=1000.

          mean se_mean   sd     2.5%      50%    97.5% n_eff Rhat
theta     0.31    0.00 0.01     0.29     0.31     0.33   280    1
lp__  -1228.96    0.03 0.75 -1231.15 -1228.66 -1228.45   545    1

Samples were drawn using NUTS(diag_e) at Tue May 15 17:46:00 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Q2: RTP

data {
  int<lower=0> N;
  real rtp[N];
}
parameters {
  real<lower = 0> mu;
  real<lower = 0> sigma;
}
model {
  rtp ~ lognormal(log(mu) - sigma * sigma / 2, sigma);
}
generated quantities {
  real simulated[25];
  for (i in 1:25) {
    simulated[i] = lognormal_rng(log(mu) - sigma * sigma / 2, sigma);
  }
}
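The location parameter is written as log(mu) − sigma² / 2 so that mu is the *mean* of the lognormal (since E[X] = exp(location + sigma² / 2)). A quick simulation check of that parameterisation, using round numbers close to the fitted values:

```r
# Check: with location = log(mu) - sigma^2 / 2, the lognormal mean is mu.
set.seed(42)
mu <- 0.8; sigma <- 0.72
x <- rlnorm(1e6, meanlog = log(mu) - sigma^2 / 2, sdlog = sigma)
mean(x)  # Close to mu = 0.8
```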

Inference for Stan model: lognormal-rtp.
4 chains, each with iter=2000; warmup=1000; thin=1; 
post-warmup draws per chain=1000, total post-warmup draws=4000.

        mean se_mean   sd   2.5%    25%    50%    75%  97.5% n_eff Rhat
mu      0.80    0.00 0.07   0.69   0.76   0.80   0.84   0.94  1901    1
sigma   0.72    0.00 0.05   0.62   0.68   0.71   0.75   0.83  1852    1
lp__  -16.12    0.02 1.01 -18.77 -16.48 -15.81 -15.41 -15.16  1759    1

Samples were drawn using NUTS(diag_e) at Tue May 15 17:46:53 2018.
For each parameter, n_eff is a crude measure of effective sample size,
and Rhat is the potential scale reduction factor on split chains (at 
convergence, Rhat=1).

Here’s the Kicker

What’s the probability of breaking even?

simulated <- rstan::extract(fit)$simulated

mean(simulated > 1)
[1] 0.25117

What's the probability of doubling your money?

mean(simulated > 2)
[1] 0.05319

Q3: Hit Rate per Combination

description   symbols    payout
(no win)                      0
1x mouse      🐭              1
1x cat        🐱              2
2x mouse      🐭🐭             5
2x cat        🐱🐱            10
3x mouse      🐭🐭🐭           20
3x cat        🐱🐱🐱          100
data {
  int<lower=0> N;
  real rtp[N];
}
parameters {
  real<lower=0, upper=1> theta[6];
  real<lower=0> sigma;
}
transformed parameters {
  real<lower=0> mu;
  mu = theta[1] * 1 +                  // Payline 1 -> 1x
       theta[2] * 2 +                  // Payline 2 -> 2x
       theta[3] * 5 +                  // Payline 3 -> 5x
       theta[4] * 10 +                 // Payline 4 -> 10x
       theta[5] * 20 +                 // Payline 5 -> 20x
       theta[6] * 100;                 // Payline 6 -> 100x
}
model {
  rtp ~ lognormal(log(mu) - sigma * sigma / 2, sigma);
  theta[1] ~ beta(2, 2);               // Mode = 0.5
  theta[2] ~ beta(2, 4);               // Mode = 0.25
  theta[3] ~ beta(2, 5);               // Mode = 0.2
  theta[4] ~ beta(2, 8);               // Mode = 0.125
  theta[5] ~ beta(2, 10);              // Mode = 0.1
  theta[6] ~ beta(2, 16);              // Mode = 0.0625
}

                mean      se_mean           sd         2.5%       97.5%    n_eff      Rhat
theta[1] 0.140072477 1.445106e-03 0.0886783678 0.0186286589 0.352326667 3765.615 0.9993099
theta[2] 0.068064522 6.711297e-04 0.0424459690 0.0102927430 0.168731470 4000.000 1.0001725
theta[3] 0.028892776 2.894141e-04 0.0183041579 0.0033953293 0.072914250 4000.000 0.9998855
theta[4] 0.014594931 1.437237e-04 0.0090898869 0.0019018620 0.036523596 4000.000 0.9999596
theta[5] 0.007441395 7.525103e-05 0.0047592927 0.0009537422 0.019112895 4000.000 1.0000193
theta[6] 0.001517744 1.482361e-05 0.0009375275 0.0002257116 0.003729426 4000.000 1.0005894

The 1x payout is triggered on average every 7 spins.

The 100x payout is triggered on average only every 659 spins.
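The waiting times quoted above follow directly from the posterior means: the average number of spins between hits on a payline is the reciprocal of its hit rate.

```r
# Posterior mean hit rate per payline (from the summary table above).
theta_mean <- c(0.140072477, 0.068064522, 0.028892776,
                0.014594931, 0.007441395, 0.001517744)

# Average spins between hits for each payline.
round(1 / theta_mean)
```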

Exit Maud!

Why Stan?

  • Extract maximum information from your data.
  • Learning curve is not as steep as you might think.

Do cool things with Stan!